Conversation

Contributor

@roomote roomote bot commented Sep 5, 2025

Summary

This PR addresses Issue #7702 by implementing configurable batch size limits for embedding models, specifically to support Aliyun Bailian's Qwen3-Embedding models, which have a maximum batch size of 10 items per request.

Problem

The Qwen3-Embedding and text-embedding-v4 models from Aliyun Bailian have a strict batch size limit of 10 items per request. When indexing codebases with more than 10 chunks, the embedding API returns an error:

HTTP 400 - Value error, batch size is invalid, it should not be larger than 10
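
For illustration, a minimal sketch of the kind of request that triggers this error, using the openai npm client pointed at Bailian's OpenAI-compatible endpoint (the base URL and chunk contents here are illustrative, not taken from the PR):

import OpenAI from "openai"

// Bailian's OpenAI-compatible endpoint; may differ per region or account.
const client = new OpenAI({
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
    apiKey: process.env.DASHSCOPE_API_KEY,
})

async function main() {
    // 25 inputs in one request exceeds the 10-item limit and fails with
    // HTTP 400 "batch size is invalid, it should not be larger than 10".
    const chunks = Array.from({ length: 25 }, (_, i) => `code chunk ${i}`)
    await client.embeddings.create({
        model: "text-embedding-v4",
        input: chunks,
    })
}

main().catch(console.error)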

Solution

  1. Added batch size configuration to embedding model profiles

    • Added maxBatchSize property to the EmbeddingModelProfile interface
    • Configured Qwen3-Embedding and text-embedding-v4 models with a 10-item batch limit
  2. Updated OpenAICompatibleEmbedder to respect batch limits

    • Modified the batching logic to check both token limits and item count limits (see the sketch after this list)
    • Ensures batches never exceed the model-specific maximum batch size
  3. Updated service factory to propagate batch limits

    • Passes model-specific batch size limits to embedders
    • Applies limits to scanner and file-watcher components
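
A minimal sketch of the batching logic described in item 2 above; the names (createBatches, estimateTokens, MAX_BATCH_TOKENS) are illustrative and not the PR's exact identifiers:

// Illustrative token budget per batch; the real limit is model-specific.
const MAX_BATCH_TOKENS = 100_000

// Rough token estimate (~4 characters per token), sufficient for this sketch.
function estimateTokens(text: string): number {
    return Math.ceil(text.length / 4)
}

// Split texts into batches that respect both the token budget and an
// optional model-specific item count limit (e.g., 10 for Qwen3-Embedding).
function createBatches(texts: string[], maxBatchSize?: number): string[][] {
    const batches: string[][] = []
    let current: string[] = []
    let currentTokens = 0

    for (const text of texts) {
        const tokens = estimateTokens(text)
        const overTokens = current.length > 0 && currentTokens + tokens > MAX_BATCH_TOKENS
        const overItems = maxBatchSize !== undefined && current.length >= maxBatchSize
        if (overTokens || overItems) {
            batches.push(current)
            current = []
            currentTokens = 0
        }
        current.push(text)
        currentTokens += tokens
    }
    if (current.length > 0) batches.push(current)
    return batches
}

With maxBatchSize set to 10, no batch ever contains more than 10 items, however small the individual texts are. (Per the test discussion below, the PR also skips single items that exceed the token limit with a warning; that path is omitted from this sketch.)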

Testing

  • Added comprehensive test suite for batch size limiting functionality
  • All existing tests pass without regression
  • New tests verify:
    • Model-specific batch size limits are respected
    • Batching works correctly with mixed text sizes
    • Aliyun Bailian models are properly limited to 10 items per batch

Impact

  • Backward compatible: Models without batch size limits continue to work as before
  • Configurable: Easy to add batch size limits for other models in the future
  • Tested: Comprehensive test coverage ensures reliability

Fixes #7702


Important

Adds batch size limits for embedding models to support Aliyun Bailian, updating embedders and service factory to respect these limits.

  • Behavior:
    • Adds maxBatchSize to EmbeddingModelProfile in embeddingModels.ts for model-specific batch size limits.
    • Updates OpenAICompatibleEmbedder in openai-compatible.ts to respect maxBatchSize.
    • Updates service-factory.ts to propagate batch size limits to embedders, scanners, and file-watchers.
  • Testing:
    • Adds openai-compatible-batch-limit.spec.ts to test batch size limiting functionality.
    • Tests ensure model-specific batch size limits are respected and batching works with mixed text sizes.
  • Misc:
    • Updates service-factory.spec.ts to mock getModelMaxBatchSize and test embedder creation with batch size limits.

This description was created by Ellipsis for b5b90b0.

- Add maxBatchSize property to EmbeddingModelProfile interface
- Add batch size limits for Aliyun Bailian models (qwen3-embedding, text-embedding-v4)
- Update OpenAICompatibleEmbedder to respect model-specific batch limits
- Update service factory to pass batch size limits to embedders and processors
- Add comprehensive tests for batch size limiting functionality

Fixes #7702
@roomote roomote bot requested review from cte, jr and mrubens as code owners September 5, 2025 12:05
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Sep 5, 2025
Contributor Author

@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.

if (modelMaxBatchSize && modelMaxBatchSize < batchSize) {
    batchSize = modelMaxBatchSize
}

Contributor Author

I notice there's duplicate logic here between createDirectoryScanner (lines 182-188) and createFileWatcher (lines 216-222) for checking model-specific batch size limits. Could we extract this into a helper method like getEffectiveBatchSize() to avoid the duplication?
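
One possible shape for that helper (hypothetical code, not taken from the PR):

// Hypothetical helper; clamps the configured batch size to the
// model-specific maximum, when one is defined.
function getEffectiveBatchSize(configuredBatchSize: number, modelMaxBatchSize?: number): number {
    if (modelMaxBatchSize !== undefined && modelMaxBatchSize < configuredBatchSize) {
        return modelMaxBatchSize
    }
    return configuredBatchSize
}

Both createDirectoryScanner and createFileWatcher could then delegate to this helper instead of repeating the comparison.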

// Aliyun Bailian models with batch size limits
"qwen3-embedding": { dimension: 1536, scoreThreshold: 0.4, maxBatchSize: 10 },
"text-embedding-v4": { dimension: 1536, scoreThreshold: 0.4, maxBatchSize: 10 },

Contributor Author

Are there other Aliyun Bailian models that might need similar batch size limits? Currently we only have qwen3-embedding and text-embedding-v4 configured. It might be worth checking their documentation for other models that could benefit from this.

dimension: number
scoreThreshold?: number // Model-specific minimum score threshold for semantic search
queryPrefix?: string // Optional prefix required by the model for queries
maxBatchSize?: number // Maximum number of items that can be sent in a single batch

Contributor Author

Consider adding more detailed documentation here to explain when and why this limit is needed. For example: 'Maximum number of items that can be sent in a single batch. Some providers (e.g., Aliyun Bailian) impose strict batch size limits on their embedding APIs.'
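
Applied to the interface, that suggestion might look like this (a sketch, following the comment style of the existing fields):

interface EmbeddingModelProfile {
    dimension: number
    // Model-specific minimum score threshold for semantic search
    scoreThreshold?: number
    // Optional prefix required by the model for queries
    queryPrefix?: string
    // Maximum number of items that can be sent in a single batch.
    // Some providers (e.g., Aliyun Bailian) impose strict batch size
    // limits on their embedding APIs.
    maxBatchSize?: number
}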

expect(mockEmbeddingsCreate.mock.calls[0][0].input).toHaveLength(10)
expect(result.embeddings).toHaveLength(10)
})
})

Contributor Author

Nice comprehensive test coverage! Consider adding one more edge case test: what happens when a single text item exceeds both the token limit AND we have a batch size limit? This would ensure the warning is still logged correctly and the item is skipped as expected.
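
A sketch of that edge case, assuming a vitest-style suite and that the embedder under test exposes a createEmbeddings method (both assumptions based on the spec files named above, not verified against the PR):

it("warns and skips a single item that exceeds the token limit when a batch size limit is set", async () => {
    // One oversized text plus a few normal ones; sizes are illustrative.
    const oversized = "x".repeat(1_000_000)
    const texts = [oversized, "chunk 0", "chunk 1", "chunk 2"]

    const warnSpy = vi.spyOn(console, "warn").mockImplementation(() => {})
    const result = await embedder.createEmbeddings(texts)

    // The oversized item is skipped with a warning rather than sent to the
    // API, and the remaining items still fit within the 10-item batch limit.
    expect(warnSpy).toHaveBeenCalled()
    expect(result.embeddings).toHaveLength(3)

    warnSpy.mockRestore()
})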

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Sep 5, 2025
@daniel-lxs
Member

Closing, see #7702 (comment)

@daniel-lxs daniel-lxs closed this Sep 5, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 5, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Sep 5, 2025
@daniel-lxs daniel-lxs deleted the fix/qwen-embedding-batch-size-limit branch September 5, 2025 23:15

Labels

  • bug (Something isn't working)
  • Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
  • size:L (This PR changes 100-499 lines, ignoring generated files.)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Code Index has no limit on batch size

4 participants